Introduction to Ambisonics¶

Fundamentals¶

Chris Hold, Virtual Acoustics (Aalto University)

Structure¶

  • Motivation
  • Discovering Spherical Harmonics
  • Ambisonics Encoder and Decoder

From Recording to Playback¶

Audio Description Formats¶

Channel Based Object Based Scene Based
Channel Object Scene

Benefits of Scene-based¶

  • Separation of recording, transmission / storage, and playback
  • Flexible with multiple options for each stage
  • Does not scale with the number of sources

Ambisonics¶

  • Implementation of a scene based format
  • "Long" history and connection to other sciences
  • Use spherical harmonics as basis functions

Example¶

2022-01-26T12:23:08.683452 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/
2022-01-26T12:23:10.184394 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/
2022-01-26T12:23:11.622985 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/
2022-01-26T12:23:12.994784 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

From the Wave Equation to Spherical Harmonics¶

Wave equation \begin{equation} \nabla^2 p - \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} = 0 \quad. \end{equation}

In frequency domain, with wave-number $k = \frac{\omega}{c}$ results in the Helmholtz equation \begin{equation} (\nabla^2 + k^2) p = 0 \quad. \end{equation}

Multiple solutions, e.g. a mono chromatic plane wave with amplitude $\hat{A}(\omega)$ in Cartesian coordinates \begin{equation} p(t, x, y, z) = \hat{A}(\omega) e^{i(k_x x + k_y y + k_z z - \omega t)} \quad. \end{equation}

2022-01-26T12:23:13.329393 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Any solution to the Helmholtz equation can also be expressed in spherical coordinates $$ p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{darkgreen}{(A_{mn} j_n(kr) + B_{mn} y_n(kr))} \; \color{darkblue}{Y_{n}^{m}(\theta,\phi)} \quad, $$

$$ p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{darkgreen}{C_{mn}(kr)} \; \color{darkblue}{Y_{n}^{m}(\theta,\phi)} \quad, $$

With two separable parts:

  • radial component : linear combination of spherical Bessel functions of first ($ j_n $) and second kind $ y_n $
  • angular component : spherical harmonics $ \color{darkblue}{Y_{n}^{m}(\theta,\phi)} $

sound field defined on a sphere is fully captured by its spherical harmonics coefficients

E.g. a unit plane wave \begin{equation} p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} 4 \pi i^n j_n(kr) \left[ Y_n^m(\theta_k,\phi_k) \right] ^* Y_{n}^ {m}(\theta,\phi) \quad. \end{equation}

Discovering Spherical Harmonics¶

$$ Y_n^m(\theta,\phi)=\color{darkorange}{\sqrt{\frac{2n+1}{4\pi}\frac{(n-m)!}{(n+m)!}}}\, \color{darkgreen}{P_n^m(\cos(\theta))}\, \color{darkred}{e^{im\phi}} \quad, $$$$ Y_n^m(\theta,\phi)=\color{darkorange}{D_{nm}}\, \color{darkgreen}{P_n^m(\cos(\theta))}\, \color{darkred}{e^{im\phi}} \quad, $$

where

  • azimuth as $\phi$ and zenith/colatitude as $\theta$.
  • $P_n^m$ is the associated Legendre polynomial of order $n$ and degree $m$.

Combined with appropriate scaling give real Spherical Harmonics $Y_{n,m}(\theta,\phi)$ as

$$ Y_{n,m}(\theta,\phi) = \sqrt{ \frac{(2n+1)}{4\pi} \frac{(n-|m|)!}{(n+|m|)!} } P_{n,|m|}(\cos(\theta)) \begin{cases} \sqrt2\sin(|m|\phi) & \mathrm{if\hspace{0.5em}} m < 0 \quad,\\ 1 & \mathrm{if\hspace{0.5em}} m = 0 \quad,\\ \sqrt2\cos(|m|\phi) & \mathrm{if\hspace{0.5em}} m > 0 \quad. \end{cases} $$
  • Always check: Condon–Shortley phase convention $(-1)^m$
2022-01-26T12:23:15.315047 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Let's look at the azimuthal component $e^{im\phi}$ and the zenithal component $P_n^m(\cos\theta)$

azimuthal component¶

2022-01-26T12:23:17.630087 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/
2022-01-26T12:23:18.636765 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

zenithal component¶

2022-01-26T12:23:19.229270 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Orthonormality¶

Two functions $\color{darkviolet}f, \color{darkred}g$ over a domain $\gamma$ are orthogonal if

$$ \int_\gamma \color{darkviolet}{f^*(\gamma)} \color{darkred}{g(\gamma)} \,\mathrm{d}\gamma = \langle \color{darkviolet}f, \color{darkred}g \rangle = 0 \quad,\,\mathrm{for}~f \neq g. $$

They are also orthonormal if $$ \color{darkviolet}{\int_\gamma f^*(\gamma) f(\gamma) \,\mathrm{d}\gamma = \int_\gamma|f(\gamma)|^2 \,\mathrm{d}\gamma = \langle f, f \rangle } = 1 \quad, \\ \color{darkred}{\int_\gamma g^*(\gamma) g(\gamma) \,\mathrm{d}\gamma = \int_\gamma |g(\gamma)|^2 \,\mathrm{d}\gamma = \langle g, g \rangle} = 1 \quad.$$

For the Spherical Harmonics:

  • the the azimuthal component $e^{im\phi}$ along $\phi$ is orthogonal w.r.t. the degree $m$
  • the zenithal component $P_n^m(\cos\theta)$ along $\theta$ is orthogonal w.r.t. the order $n$.

Their product is still orthogonal, and the scaling $\color{darkorange}{D_{nm}}$ ensures orthonormality such that $$ \int_\Omega Y_n^m(\Omega)^* \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \langle Y_n^m(\Omega) , Y_{n'}^{m'}(\Omega) \rangle = \delta_{nn'}\delta_{mm'} \quad , $$

and

$$ \int_{{\Omega} \in \mathbb{S}^2} |Y_n^m({\Omega})|^2 \mathrm{d}{\Omega} = 1 \quad . $$

Example¶

show normality for $Y_0^0$: $$ Y_0^0(\theta,\phi)=\sqrt{\frac{0+1}{4\pi}\frac{(0)!}{(0)!}} P_0^0(\cos(\theta)) e^{i0\phi} = \sqrt{\frac{1}{4\pi}} \quad, $$ hence $$ \int_{{\Omega} \in \mathbb{S}^2} Y_0^0(\theta,\phi)^* Y_0^0(\theta,\phi) \,\mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \sqrt{\frac{1}{4\pi}}\sqrt{\frac{1}{4\pi}} \, \mathrm{d}{\Omega} = 4\pi \frac{1}{4\pi} = 1 \quad. $$

Example¶

Test orthogonality of functions $\color{darkviolet}f, \color{darkred}g$

$$ \langle \color{darkviolet}f, \color{darkred}g \rangle \overset{?}{=} 0 \quad,\,\mathrm{for}~f \neq g. $$
Test for orthogonality in azimuth <f_azi, g_azi> =  0.0
Test for orthogonality in zenith <f_zen, g_zen> =  0.0
2022-01-26T12:23:19.566390 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Spherical Harmonic Transform (SHT)¶

  • Find a scene based encoding for sound field
  • We showed that the spherical harmonics $Y_n^m({\Omega})$ are a set of suitable basis functions on the sphere
  • We also showed that a sound field (on the sphere) $s({\Omega})$ is fully captured by its spherical harmonics coefficients $\sigma_{nm}$

This can be expressed with $\Omega = [\phi, \theta]$ as the inverse Spherical Harmonic Transform (iSHT) $$ s({\Omega}) = \sum_{n = 0}^{N=\infty} \sum_{m=-n}^{+n} \sigma_{nm} Y_n^m({\Omega}) \quad. $$

Spherical harmonics coefficients $\sigma_{nm}$ can be derived with the Spherical Harmonic Transform (SHT) $$ \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = \langle [Y_n^m({\Omega})] , s({\Omega}) \rangle \quad. $$

  • SHT also referred to as spherical Fourier transform
  • Note the transform from a spatially discrete to continuous domain

Spherical Grids¶

The SHT evaluates the continuous integral over $\Omega$ $$ \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} \quad . $$ Quadrature methods allow evaluation by spherical sampling at certain (weighted) grid points such that \begin{equation} \sigma_{nm} \approx \sum_{q=1}^{Q} w_q s({\Omega}_q) [Y_n^m({\Omega}_q)]^* \quad. \end{equation}

Certain grids with sampling points $ {\Omega}_q $ and associated sampling weights $w_q$ have certain properties:

  • quadrature grids allow numerical integration of spatial polynomials
  • Spatial sampling dictates maximum order
  • easy to evaluate are uniform/regular grids, with constant $w_q = \frac{4\pi}{Q}$
  • An example are so-called spherical t-designs(t), which allow a SHT up to order $N = \lfloor t/2 \rfloor$
2022-01-26T12:23:19.764187 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/
2022-01-26T12:23:20.010312 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Spatial Dirac¶

We can show that spherical harmonics are orthogonal (even orthonormal) with $$ \int_\Omega Y_n^m(\Omega) \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \delta_{nn'}\delta_{mm'} \quad . $$

Because of their completeness, we can also directly formulate a spatial Dirac function on the sphere as $$ \sum_{n=0}^{N=\infty} \sum_{m=-n}^n [Y_n^m({\Omega'})]^* Y_n^m(\Omega) = \delta(\Omega - \Omega') \quad, $$

and therefore the spherical Fourier coefficients $\sigma_{nm}$ $$ SHT\{\delta(\Omega - \Omega')\} = \int_{{\Omega} \in \mathbb{S}^2} \delta(\Omega - \Omega') \, [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = [Y_n^m({\Omega'})]^* \quad . $$

Order-Limitation of Spatial Dirac Pulse¶

2022-01-26T12:23:20.433111 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Example¶

Integrate (order-limited) Spatial Dirac $\delta_N(\Omega - \Omega')$ over sphere

$$ \int_{{\Omega} \in \mathbb{S}^2} \delta_N(\Omega - \Omega') \mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \color{darkblue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} \quad, $$

by discretization with sufficient t-design $$ \int_{{\Omega} \in \mathbb{S}^2} \color{darkblue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} = \frac{4\pi}{Q}\sum_{q=1}^{Q} \color{darkorange}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega_q)} $$

2022-01-26T12:23:20.598733 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Matrix Notations¶

Stack the spherical harmonics evaluated at $\Omega$ up to spherical order $N$ as

$$ \mathbf{Y} = \left[ \begin{array}{ccccc} Y_0^0(\Omega[0]) & Y_1^{-1}(\Omega[0]) & Y_1^0(\Omega[0]) & \dots & Y_N^N(\Omega[0]) \\ Y_0^0(\Omega[1]) & Y_1^{-1}(\Omega[1]) & Y_1^0(\Omega[1]) & \dots & Y_N^N(\Omega[1]) \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ Y_0^0(\Omega[Q-1]) & Y_1^{-1}(\Omega[Q-1]) & Y_1^0(\Omega[Q-1]) & \dots & Y_N^N(\Omega[Q-1]) \end{array} \right] $$

such that $ \mathbf{Y} : [Q, (N+1)^2]$

  • soundfield pressure is real signal -> only real spherical harmonics needed
  • orthonormal scaling introduced earlier called N3D by convention (in contrast to SN3D)
  • stacking them as above, where $idx_{n,m} = n^2+n+m$, is called ACN

Obtaining the discrete signals $s_q(t)$ is a linear combination of SH basis functions evaluated at $\Omega_q$.

This inverse transform in matrix notation with ambisonic signals matrix $\mathbf{\chi} : [1, (N+1)^2]$ is

$$ s_q(t) = \mathbf{\chi}(t) \, \mathbf{Y}^T \quad . $$

We obtain ambisonic signals matrix $\chi : [1, (N+1)^2]$ from signals $\mathbf{S} : [1, Q]$ by SHT as

$$ \mathbf{\chi}(t) = \mathbf{S}(t) \, \mathrm{diag}(w_q) \, \mathbf{Y} \quad. $$

From the Ambisonics Encoder, to Spatial Weighting, to a Decoder

Encoder¶

  • We can stack a time signal $s(t_0, t_1, \ldots, t_n)$ as vector $\mathbf{s}$.
  • This leads to a matrix notation of multiple discrete signals $\mathbf{S} : [t, Q]$.
  • Similarly, we can stack the Ambisonics coefficients into matrix $\mathbf{\chi} : [t, (N+1)^2]$

A single plane wave encoded in direction $\Omega$ with signal $\mathbf{s}$ is directly the outer product with the spatial Dirac coefficients

$$ \mathbf{\chi}_{PW(\Omega)} = \mathbf{s} \, \mathbf{Y}(\Omega) \quad. $$

For multiple sources $Q$, we stack and sum $$ \mathbf{\chi}_{PW(\Omega_Q)} = \sum_{q=1}^Q\mathbf{s}_q \, \mathbf{Y}(\Omega_q) = \mathbf{S} \, \mathbf{Y} \quad. $$

Spatial Weighting¶

  • Directionally filtering a soundfield in direction $\Omega_k$ by spatial weighting
  • Weighting in SH domain $w_{nm}$ is an elegant way of beamforming

The simplest beamformer is a spatial Dirac in direction $\Omega_k$ normalized by its energy, i.e. $\mathrm{max}DI$ $$ w_{nm, \mathrm{max}DI}(\Omega_k) = \frac{4\pi}{(N+1)^2} Y_{n,m}(\Omega_k) $$

2022-01-26T12:23:21.296669 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Other patterns can be achieved by weighting the spherical Fourier spectrum. Axis-symmetric patterns reduce to only a modal weighting $c_n$, such that $$ w_{nm}(\Omega_k) = c_{n} \, Y_{n,m}(\Omega_k) $$

E.g. $\mathrm{max}\vec{r}_E$ weights each order with $$ c_{n,\,\mathrm{max}\vec{r}_E} = P_n[\cos(\frac{137.9^\circ}{N+1.51})] \quad, $$ with the Legendre polynomials $P_n$ of order $n$.

2022-01-26T12:23:22.675124 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Or we can define a (spatial) Butterworth filter with $$ c_{n,\,\mathrm{Butterworth}} = \frac{1}{\sqrt{1+(n/n_c)^{2k}}} \quad, $$ with the filter order $k$ and cut-on $n_c$.

2022-01-26T12:23:23.336927 image/svg+xml Matplotlib v3.5.0, https://matplotlib.org/

Decoder¶

  • Pointing a set of beamformers in directions $\Omega_k$ results in a set of signal components
  • in case of $\mathrm{max}DI$ proportional to iSHT in $\Omega_k$
$$ s({\Omega_k}) = \sum_{n = 0}^{N} \sum_{m=-n}^{+n} w_{nm}({\Omega_k}) \, \sigma_{nm} \quad, $$

or in matrix notation with $\mathbf{S} : [t, Q]$ and beamforming weights $\mathbf{c}_n$ $$ \mathbf{S} = \mathbf{\chi} \, \mathrm{diag_N}(\mathbf{c}_n) \, \mathbf{Y}^T \quad . $$

Example¶

Decode on a t-design(6) (sufficient up to $N = 3$):

  • a 3rd order ambisonic signal

Example¶

Decode on a t-design(6) (sufficient up to $N = 3$):

  • a 5th order ambisonic signal
  • an 8th order ambisonic signal

Loudspeaker Decoders¶

In [23]:
# Loudspeaker Setup
ls_dirs = np.array([[-135, -80, -45, 0, 45, 80, 135, -135, -60, -30, 30, 60, 135],
                    [0, 0, 0, 0, 0, 0, 0, 60, 60, 60, 60, 60, 60]])
ls_x, ls_y, ls_z = spa.utils.sph2cart(spa.utils.deg2rad(ls_dirs[0, :]),
                                      spa.utils.deg2rad(90 - ls_dirs[1, :]))
ls_setup = spa.decoder.LoudspeakerSetup(ls_x, ls_y, ls_z)
ls_setup.pop_triangles(normal_limit=85, aperture_limit=90,
                       opening_limit=150)
ls_setup.show()
Face not pointing towards listener: [0 1 2]
Face not pointing towards listener: [0 6 2]
Face not pointing towards listener: [2 5 6]
Face not pointing towards listener: [2 3 4]
Face not pointing towards listener: [2 5 4]
In [24]:
spa.plots.decoder_performance(ls_setup, 'NLS')
In [25]:
spa.plots.decoder_performance(ls_setup, 'VBAP')
In [26]:
spa.plots.decoder_performance(ls_setup, 'SAD')
In [27]:
spa.plots.decoder_performance(ls_setup, 'MAD')
In [28]:
spa.plots.decoder_performance(ls_setup, 'EPAD', N_sph=2)
In [29]:
ls_setup.ambisonics_setup(update_hull=True)
spa.plots.decoder_performance(ls_setup, 'ALLRAD')
In [30]:
ls_setup.ambisonics_setup(update_hull=True)
spa.plots.decoder_performance(ls_setup, 'ALLRAD2')

Binaural Decoding¶

$$ s^{l,r}(t) = x(t) * h_{\mathrm{HRIR}}^{l,r}({\Omega}, t) \quad, $$

where $(*)$ denotes the time-domain convolution operation.

Transforming to the time-frequency domain through the time-domain Fourier transform, further assuming plane-wave components $\bar X (\Omega)$, the ear input signals are given as $$ S^{l,r}(\omega) = \int_{\Omega} \bar X (\Omega, \omega) H_{nm}^{l,r}(\Omega, \omega) \,\mathrm{d}\Omega \quad. $$

$$ S^{l,r}(\omega) = \sum_{n = 0}^{N} \sum_{m = -n}^{+n} \chi_{nm}(\omega) \breve H_{nm}^{l,r}(\omega) \quad. $$

For one ear (left) this can be interpreted as a frequency dependent ambisonic beamformer $$ s^l(\omega) = \chi_{nm}(\omega) [\breve H_{nm}^{l}(\omega)]^T \quad . $$

In [ ]: